Genome-wide identification and characterization of nodulation-related genes in Arachis hypogaea

Nitrogen is an important plant nutrient that has a significant role in crop yield. Hence, to fulfill the needs of sustainable agriculture, it is necessary to improve biological nitrogen fixation in leguminous crops. Nod inducing gene families play a crucial role in the interaction between rhizobia and legumes, leading to biological nitrogen fixation. However, the identification and characterization of Nod inducing genes has not yet been performed in Arachis hypogaea. In this study, Nod inducing genes were identified and analyzed genome-wide to explore their potential functions in Arachis hypogaea for the first time. The genes were comprehensively analyzed by phylogenetic clustering, gene structure determination, detection of conserved motifs, subcellular localization, and analysis of cis-acting elements and promoter regions. This study identified 42 Nod inducing genes in Arachis hypogaea; their sequences were submitted to NCBI and accession numbers were obtained. The potential involvement of these genes in biological nitrogen fixation was explored: phylogenetic analysis revealed that Nod inducing genes evolved independently in Arachis hypogaea; the amino acid sequences exhibited 20 highly conserved motifs; the proteins are present at different locations in cells; and the gene structures revealed that all the genes are full-length genes with upstream intronic regions. Further, the promoter analysis detected a large number of cis-regulatory elements involved in nodulation. This study not only provides the identification and characterization of genes underlying the developmental and functional stages of nodulation and biological nitrogen fixation but also lays the foundation for further investigation of the Nod inducing gene family. In addition, the identification and structural analysis of these genes in Arachis hypogaea may provide a theoretical basis for the study of evolutionary relationships in future analyses.


Introduction
Gene expression regulation at the transcriptional and post-transcriptional levels might influence and control important biological activities and processes such as the perception of compounds. Considering the importance of nodulation genes and the lack of information about gene families in this process, we aimed to conduct a genome-wide identification and characterization of the NIG family genes in Arachis hypogaea.

Identification and annotation of nodulation genes in Arachis hypogaea
To perform a comprehensive analysis of Nod genes in Arachis hypogaea, the nodulation genes reported in model legumes (NFP, SYMRK, NUP85, NUP133, CCaMK, CYCLOPS, NSP1, NIN, ERN1, Nod, Nol) were first compiled [18]. A local BLASTp search was used to retrieve homologous FASTA sequences of these genes from NCBI (https://www.ncbi.nlm.nih.gov/). The FASTA sequences retrieved from NCBI were used as queries to carry out BLASTn searches in the peanut genomic database. The sequences were selected from PeanutBase (https://www.peanutbase.org/) with a cutoff e-value of 0e. To verify the reliability of the results and to confirm each predicted sequence, all gene sequences were checked for domains in the PFAM database (http://pfam.xfam.org/) [22]. Full-length genes, half-length genes and genes with no domains in the ORF were manually identified, and all redundant sequences were then removed. A total of 42 genes were designated as nodulation-related genes in Arachis hypogaea for further in silico analysis. The ExPASy translate tool (http://www.expasy.ch/tools/dna.html) was used to deduce the amino acid sequences or open reading frames of the Nod inducing genes. The chemical and biophysical parameters of the 42 Nod inducing genes in Arachis hypogaea were then predicted from their primary sequences with ProtParam (http://expasy.org/tools/protparam.html), available at ExPASy [23]. These properties were predicted to explore the characteristics of the genes, e.g., protein length (aa), coding sequence (CDS), gene length (bp), molecular weight (MW), grand average of hydropathicity (GRAVY), isoelectric point (pI), instability index and aliphatic index (AI).
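Two of the ProtParam properties listed above have simple closed forms, and a minimal sketch can make them concrete. The block below is not the ProtParam implementation; it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the Ikai formula for the aliphatic index, and the peptide used in the usage lines is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = protein.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical peptide, used only to show the calls.
toy_peptide = "MAVLIKAG"
print(round(gravy(toy_peptide), 3), round(aliphatic_index(toy_peptide), 2))
```

A positive GRAVY value indicates a more hydrophobic protein and a negative value a more hydrophilic one, which is how the variation reported for the NIG family would be read.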

Sequence alignment and phylogeny inference
The sequences of the Nod inducing genes were retrieved from PeanutBase (https://www.peanutbase.org), and the full-length gene sequences were aligned with the "Align by MUSCLE" option of MEGA 10.2.4 (https://www.megasoftware.net/) at default settings. An unrooted tree was then constructed in MEGA 10.2.4 with the neighbor-joining (NJ) algorithm. The reliability of the tree was tested with 1000 bootstrap replicates and a 50% cutoff value, and the tree was then visualized.
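Neighbor-joining works on a matrix of pairwise distances computed from the alignment. As a rough illustration of that input, the sketch below computes the proportion of differing sites (p-distance) between aligned sequences while skipping any column in which either sequence of the pair has a gap. This is a simplified stand-in, not MEGA's code; the choice of p-distance and the toy sequences are assumptions made for the example.

```python
from itertools import combinations

GAP_CHARS = {"-", "?"}  # symbols treated as missing data in this sketch


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: drop the site for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def distance_matrix(aligned: dict) -> dict:
    """Distances for every unordered pair of named, aligned sequences."""
    return {
        (name_a, name_b): p_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned fragments, not real NIG sequences.
toy_alignment = {"geneA": "ACGT-A", "geneB": "ACCTGA", "geneC": "ACGTGA"}
print(distance_matrix(toy_alignment))
```

Because sites are dropped per pair rather than across the whole alignment, each pair of genes can contribute a different number of compared positions.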

Gene structure and conserved motif distribution analysis
Structural information on the Nod inducing genes, i.e., their intron/exon patterns, was predicted with the online Gene Structure Display Server (http://gsds.gao-lab.org) [24,25], using the coding regions (CDS) of the genes in BED file format. The MEME program (https://meme-suite.org/meme/) was used to predict significant functional and conserved protein motifs. The parameters were set as follows: optimum motif width, 3 residues; number of unique motifs, 20; distribution of motifs, any number of repetitions [26].
Segmental duplications were then represented by drawing red lines between the two duplicated genes of each pair.

Subcellular localization prediction and sequence logos analysis
To predict the subcellular localization of the NIG family proteins and better understand their functions, all 42 predicted DNA FASTA sequences of Arachis hypogaea were translated to protein sequences with the online ExPASy translate tool (https://web.expasy.org/translate/). Plant-mPLoc (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/) was used to identify their subcellular locations [29]. The Plant-mPLoc predictor assigns plant proteins to 12 different locations, i.e., chloroplast, cell wall, endoplasmic reticulum, cytoplasm, extracellular, mitochondria, Golgi apparatus, nucleus, plasma membrane, peroxisome, plastid and vacuole. Furthermore, sequence logos help to determine which amino acids are conserved or non-conserved among all the genes. Hence, for the sequence logo analysis, the proteins of the 42 NIG family genes were aligned with CLUSTALW (https://www.genome.jp/tools-bin/clustalw) [30] and sequence logos were then generated with WebLogo (https://weblogo.berkeley.edu/logo.cgi) [31].
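The height of each stack in a sequence logo reflects how conserved that alignment column is, measured in bits. The sketch below computes this information content per column for a protein alignment as log2(20) minus the Shannon entropy of the observed residues. It is a simplified illustration that omits the small-sample correction WebLogo applies, and the aligned fragments are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total)
        for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def logo_profile(aligned: list) -> list:
    """Information content for every column of equally long aligned sequences."""
    length = len(aligned[0])
    return [
        column_information("".join(seq[i] for seq in aligned))
        for i in range(length)
    ]


# Hypothetical aligned protein fragments.
toy_proteins = ["MKLA", "MKIA", "MRLA", "MK-A"]
print([round(bits, 2) for bits in logo_profile(toy_proteins)])
```

A fully conserved column reaches the maximum of about 4.32 bits, while a column split evenly between two residues scores one bit less.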

Retrieval of promoter regions and Cis-acting elements analysis
The promoter regions of the Arachis hypogaea NIG genes, 1.5 kb upstream of the translation start site, were downloaded from the peanut database. The PlantCARE tool (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [32] was used to investigate the cis-acting elements in the promoters of the NIG genes of Arachis hypogaea, which were then classified according to their known functions.
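Taking the sequence upstream of a start codon depends on the strand of the gene: for a gene on the minus strand, the upstream region lies to the right of the gene on the reference and must be reverse-complemented. The sketch below illustrates this under assumed conventions (0-based, half-open coordinates on the forward strand); the function name, arguments and toy chromosome are hypothetical and do not describe how the peanut database exports promoters.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, cds_start: int, cds_end: int,
                    strand: str, length: int = 1500) -> str:
    """Sequence of up to `length` bp upstream of the translation start.

    cds_start and cds_end are 0-based, half-open forward-strand coordinates.
    The result is always given 5' to 3' relative to the gene.
    """
    if strand == "+":
        begin = max(0, cds_start - length)
        return chrom_seq[begin:cds_start]
    if strand == "-":
        end = min(len(chrom_seq), cds_end + length)
        return reverse_complement(chrom_seq[cds_end:end])
    raise ValueError("strand must be '+' or '-'")


# Hypothetical 12 bp chromosome with a gene occupying positions 6..9.
toy_chromosome = "AAACCCGGGTTT"
print(upstream_region(toy_chromosome, 6, 9, "+", length=3))
print(upstream_region(toy_chromosome, 6, 9, "-", length=3))
```

The region is truncated at the chromosome ends, so genes close to a boundary can yield fewer than 1500 bp.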

Identification and annotation of NIG gene family in Arachis hypogaea
Previous studies identified many nodulation genes in legume plants [17,18]. Genome-wide studies of nodulation genes have been conducted in many legume plants by exploring their publicly available data [33]. The current study identified a total of 42 novel Nod inducing genes in Arachis hypogaea (S1 Text) by homology search. All the genes identified in the NIG gene family, which encode proteins with a nodulation function, were analyzed for their role in the nodulation process according to their location on the chromosomes. The ExPASy translate tool was used to deduce the amino acid sequences or open reading frames of the Nod inducing genes (S2 Text). Based on PFAM analysis, the study determined whether each candidate gene contained the conserved domain, and the presence of the related domains in the PFAM database was ultimately confirmed. The individual gene names were assigned based on sequence identity with the functionally characterized NIG family. The 42 full-length sequences identified were submitted to NCBI and their accession numbers were obtained. The gene names of the NIG family, their accession numbers, the length of the coding sequences and the characteristics of the encoded proteins are presented in Table 1.
The biochemical properties and physical parameters of the 42 Nod inducing genes in Arachis hypogaea were predicted from their primary sequences with ProtParam, available at ExPASy. These properties were predicted to explore important characteristics of the genes, e.g., coding sequence (CDS), protein length (aa), isoelectric point (pI), molecular weight (MW), aliphatic index (AI), extinction coefficient (EC, assuming that all pairs of Cys residues form cystines), grand average of hydropathicity (GRAVY) and estimated half-life. The full-length coding sequences of the Nod inducing genes ranged from 543 bp (AhNKLM1) to 3996 bp (AhNNup3), and their putative proteins contained between 343 and 1331 amino acid (aa) residues, with an average of ~631 aa. The theoretical pI ranged from 4.82 (AhNNLC1, AhNNLC2) to 9.08 (AhNSur6), and the molecular weights ranged from 42957.07 Da (AhNSur10) to 149018.71 Da (AhNNup4). In line with these features, the genome-wide analysis also detected significant variation in the aliphatic index, extinction coefficients and GRAVY, implying a high degree of complexity and functional diversification within the NIG family of Arachis hypogaea (Table 1).

PLOS ONE

Evolutionary analysis of NIG gene family
To assess the evolutionary relationships, the sequences of all identified genes of the NIG family (Nod Inducing Gene family) were aligned and a phylogenetic tree was constructed using the neighbor-joining method (Fig 1). All relationships were inferred from a full-gene multiple sequence alignment of the 42 Nod inducing genes, and all ambiguous positions were removed for each sequence pair (pairwise deletion option). The phylogenetic analysis of these gene sequences resulted in a well-resolved tree, showing that the NIG family in Arachis hypogaea is classified into three major sub-families, referred to as NIG-a, NIG-b and NIG-c. NIG-a has 14 members comprising five domains, while NIG-b has the largest number of members (16), comprising four domains. Sub-family NIG-c has the smallest number of members (12), also comprising four domains. All three sub-families appear to be derived from a common ancestor with close homology. Based on their domains, the members of the NIG family show functional similarities and have a close phylogenetic relationship. Furthermore, the sub-families were divided into twelve groups, namely NIG-a1, NIG-a2, NIG-a3, NIG-a4, NIG-b1, NIG-b2, NIG-b3, NIG-b4, NIG-c1, NIG-c2, NIG-c3 and NIG-c4, on the basis of the structural and functional similarities of the genes: genes that were more closely related in structure and function appeared in the same group. Thus, based on the close phylogenetic reconstruction, we speculate that these sub-families are involved in biological nitrogen fixation. Conserved domain analysis revealed that the groups of sub-family NIG-a (NIG-a1, NIG-a2, NIG-a3 and NIG-a4) clustered together in the phylogenetic tree. The groups of sub-family NIG-b (NIG-b1, NIG-b2, NIG-b3 and NIG-b4) likewise showed a close relationship, as did the groups of sub-family NIG-c (NIG-c1, NIG-c2, NIG-c3 and NIG-c4).
The 6 members of NIG-a1 (AhNPR1, AhNPR2, AhNPR3, AhNPR4, AhNPR5 and AhNPR6) carry the RWP-RK and PB1 domains, which are reported to be involved in nodule perception through their roles in nitrate response and nodulation. This in silico study further characterized 4 members of the same sub-family (AhNKEF1, AhNKEF2, AhNKEF3 and AhNKEF4), which form group NIG-a2 and, based on their domains, are predicted to function in rhizobium infection. The 2 members of NIG-a3, AhNNLC1 and AhNNLC2, have an Nsp1-like C-terminal region and therefore seem to participate in infection and normal rhizobia bacteroid formation in the nodule. The two members of the NIG-a4 group, AhNKLM1 and AhNKLM2, contain a protein kinase domain and a LysM domain. Two further members, AhNMTL1 and AhNMTL2, belong to the NIG-b1 group, and we found that they may function in root nodule symbiosis. The two members of NIG-b3, AhNNup3 and AhNNup4, carry the Nup85 Nucleoporin domain, which might be an important part of the nucleopore subcomplex and has been demonstrated to be involved in rhizobia colonization. Most of the groups with the SurNod19 domain belong to the NIG-b sub-family: NIG-b2 has 6 members (AhNSur4, AhNSur5, AhNSur6, AhNSur12, AhNSur13 and AhNSur14) and NIG-b4 has 6 members (AhNSur1, AhNSur2, AhNSur15, AhNSur16, AhNSur17 and AhNSur18). In addition, the 6 members of NIG-c1 (AhNSur3, AhNSur7, AhNSur8, AhNSur9, AhNSur10 and AhNSur11) have the same domain, so NIG-b2, NIG-b4 and NIG-c1 appear to be functionally equivalent; this domain has been documented to participate in nodule development. The groups of NIG-c are functionally characterized, suggesting that most groups of this sub-family have functions similar to those of the groups of NIG-a and NIG-b. The 2 members of NIG-c2 (AhNNup1 and AhNNup2) are functionally similar to NIG-c3.
AhNKEF5 and AhNKEF6 belong to NIG-c3 and are functionally similar to NIG-a2, i.e., according to their domains they are characterized as genes causing rhizobial infection and nodulation. The 2 members of NIG-c4 (AhNKLM3 and AhNKTyr) include the LysM, protein kinase and Pkinase_Tyr domains, which have been reported to be involved in rhizobial nodule symbiosis. Hence, these analyses revealed the evolutionary relationships of the Nod inducing genes in the NIG family, and the domains of these genes are implicated in rhizobial infection and nodulation, which lead to biological nitrogen fixation.

Conserved motif analysis and intron/exon organization of NIG gene family
Conserved motifs of the NIG family genes were predicted with the online MEME tool, because conserved motif prediction helps to understand the diversification and structural characteristics of genes. Detailed information for the 42 genes of the NIG family, including motif name, width and best possible matches, is presented in this study. The current study found 27 conserved motifs ranging from 15 to 50 amino acids in width. The identified motifs and their schematic distribution in all sub-groups are shown in Fig 2, and the sequence, sites and width of each conserved motif are listed in Table 2. The number of motifs was comparatively greater in the NIG-b sub-family than in the NIG-a and NIG-c sub-families. Motif 1 was found in all Nodulin stress up-regulating genes of groups NIG-b2, NIG-b4 and NIG-c1 except for AhNSur4. Similarly, motif 4 was exclusively present in all Nodulin stress up-regulating genes except for AhNSur6. Motifs 2, 7 and 11 were unique to the genes of the NIG-b2, NIG-b4 and NIG-c1 groups. Motifs 5 and 8 were encountered in the NIG-b2 and NIG-b4 groups. Motif 6 was present in 10 of the 18 genes of the AhNSur group (AhNSur1, 2, 6, 12, 13, 14, 15, 16, 17 and 18), suggesting that this motif might have a conserved function. Motifs 3, 16, 21 and 24 were detected only within some genes of the AhNSur and AhNPR groups. Motifs 9, 12, 14 and 20 were found in the NIG-a1 group. Motif 25 was also encountered in the NIG-a1 group, except for the AhNPR1 and AhNPR2 genes. Motifs 13 and 27 were documented for all genes of NIG-a1 and NIG-b1. Motifs 15, 17, 19 and 23 were unique to some genes of the NIG-a1, NIG-a2, NIG-a4 and NIG-b1 groups, suggesting that these genes may be derived from a common ancestor. Motif 22 was encountered only in the AhNKEF genes, where it was present in all of them. Motifs 18 and 26 were observed within most genes of groups NIG-a2, NIG-a4, NIG-b1, NIG-c2, NIG-c3 and NIG-c4.
The analysis also suggests that motif 10 is conserved in most of the NIG family (present at 30 different sites, with a 21 amino acid sequence), suggesting that it could be a signature motif of the NIG family genes. In contrast, the motifs that are not conserved across the NIG family point to group specificity, which could be related to diversification of function. Furthermore, the analysis revealed that the conserved domains in the genes of the NIG family are positioned at similar locations. The results demonstrate that the majority of the genes of this family are closely related and share common motif compositions, which indicates that the gene structures are highly conserved.
The structural diversity and functional characterization of NIG family genes are important for understanding the evolution of this gene family. Hence, BED files were used to analyze the exon/intron structure of the NIG family genes. The analysis suggested that all are full-length genes, i.e., the domains for nodulation lie inside the CDS regions. These genes also contain intronic regions, which lie upstream of the CDS regions, whereas the genes AhNSur6, AhNSur12, AhNSur14 and AhNKLM3 contained no intronic region (Fig 3). These observations indicated that the gene structure is strongly conserved within each group of the NIG family.
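When gene structure is described by coding blocks in BED-style coordinates, introns are simply the gaps between consecutive blocks of the same gene, and a gene with a single block has none. The sketch below illustrates that derivation under the assumption of 0-based, half-open intervals; the function name and coordinates are hypothetical and are not taken from the GSDS server.

```python
def introns_from_blocks(blocks: list) -> list:
    """Gaps between consecutive coding blocks of one gene.

    blocks: list of (start, end) tuples, 0-based and half-open as in BED.
    Returns the intron intervals in the same coordinate convention.
    """
    ordered = sorted(blocks)
    introns = []
    for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > previous_end:  # adjacent or overlapping blocks leave no gap
            introns.append((previous_end, next_start))
    return introns


# Hypothetical coordinates for a three-block gene and a single-block gene.
multi_block_gene = [(100, 200), (300, 450), (500, 600)]
single_block_gene = [(1000, 1900)]
print(introns_from_blocks(multi_block_gene))
print(introns_from_blocks(single_block_gene))
```

Genes reported without an intronic region would correspond to the single-block case, for which the function returns an empty list.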

In the Arachis hypogaea genome, we analyzed the duplication mechanisms that acted during the evolution of this gene family. By analyzing the sequence coverage and similarities of the 42 NIG genes, we identified 7 pairs of genes (AhNKEF5/AhNKEF6, AhNSur9/AhNSur10, AhNNLC1/AhNNLC2, AhNMTL1/AhNMTL2, AhNNup3/AhNNup4, AhNSur1/AhNSur2 and AhNSur15/AhNSur16) that experienced segmental duplication, whereas no tandem or whole-genome duplication events were detected.
These observations suggested that segmental duplication contributed largely to the expansion of NIG family members in Arachis hypogaea. The segmental duplication events are shown as red lines in Fig 4. Several Arachis hypogaea chromosomes (Chr02, Chr10, Chr12, Chr13, Chr16 and Chr20) also carry overlapping nodulation inducing genes. Ka and Ks values for the duplicated gene pairs were calculated with the KaKs calculator, and the resulting values were used to determine the selective evolutionary pressure. Six of the 7 gene pairs had Ka/Ks < 1, which indicates purifying selection, while one gene pair had Ka/Ks > 1, which implies positive selection during evolution. In addition, the segmental duplication events of the 7 gene pairs were predicted to have occurred between 0.286 and 41.607 million years ago (Table 3).
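The interpretation of Ka/Ks described above can be written as a small decision rule. The sketch below also shows one commonly used way of dating a duplication, T = Ks / (2 x rate); the text does not state which formula or substitution rate was used, so both the formula and the rate in the usage lines are assumptions made only for illustration, as are the Ka and Ks values.

```python
def selection_type(ka: float, ks: float) -> str:
    """Classify selective pressure on a duplicated gene pair from Ka and Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks: float, rate: float) -> float:
    """Approximate divergence time in million years, assuming T = Ks / (2 * rate).

    rate is the number of synonymous substitutions per site per year and
    must be supplied by the caller; no default is implied by the study.
    """
    return ks / (2.0 * rate) / 1e6


# Hypothetical Ka/Ks values and a hypothetical substitution rate.
print(selection_type(ka=0.10, ks=0.50))
print(selection_type(ka=0.60, ks=0.30))
print(divergence_time_mya(ks=0.02, rate=1e-8))
```

Under this rule a pair with a low Ks is dated as a recent duplication, which is why the estimated ages can span a wide range across gene pairs.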

NIG family protein subcellular localization and sequence logos analysis
Protein localization analysis with Plant-mPLoc predicted that 37 proteins of the NIG family are located at a single site, i.e., in the nucleus, the chloroplast or the cytoplasm, while 5 proteins were found at multiple sites. For instance, the proteins encoded by the AhNNup3 and AhNNup4 genes are located in the chloroplast and nucleus, the proteins encoded by AhNNLC1 and AhNNLC2 are located in the cell membrane and nucleus, and the protein encoded by the AhNKLM2 gene is present in the cell membrane, cytoplasm and nucleus (Fig 5).
The sequence logos of the NIG family proteins help to evaluate and discover the pattern of amino acid conservation across all 42 genes. Sequence logos of the aligned amino acid residues of the nodulation genes in Arachis hypogaea were generated to determine whether the NIG family proteins were conserved in all 42 genes throughout evolution (S1 Fig). The analysis showed that the protein sequences had moderate to high levels of conservation at many positions from the N to the C terminus.

Retrieval of promoter regions and cis-acting elements analysis
To evaluate the transcriptional regulation of the NIG gene family in response to different environmental conditions, the promoters and cis-acting regulatory elements of the candidate genes were identified. The 1500 bp upstream of the start codon was selected for the identification of putative cis-acting regulatory elements (CAREs) (S3 Text). The study revealed a total of 55 kinds of cis-acting regulatory elements across all Arachis hypogaea nodulation-related genes, with lengths varying from 5 to 13 bp. Fig 6 shows the frequency of occurrence of each cis-acting element in each gene. The cis-acting elements of all the candidate genes were grouped into functional categories, as shown in Fig 7. Among the identified cis-acting elements, common elements such as the TATA-box and CAAT-box were present in abundance and were shared by all NIG genes. The other cis-acting regulatory elements identified were associated with environmental stress-related factors, hormonal regulation and cellular development. For instance, motifs such as ACE, GT1-motif, AE-box, GCC-box, ATC-motif, I-box, Box II, Box III, Box 4, chs-CMA1a, chs-CMA2a, CCAAT-box, MBS, LTR, ABRE, O2-site, G-box, GATA-motif, GA-motif and LAMP-element are associated with stress responses such as low temperature, light, oxidation, defense, wounding, drought, anaerobic induction and anoxic response. After the stress-responsive motifs, the motifs involved in hormonal regulation were the second largest in number. Motifs such as CGTCA and TGACG (methyl jasmonate), ABRE (abscisic acid), TCA-element (salicylic acid) and AuxRR-core (auxin responsiveness) were considered hormone-responsive elements in NIG genes. The elements involved in cellular development were fewer in number than the hormonal and stress-sensitive elements. These cis-acting elements include CAT-box, O2-site, GCN4_motif, circadian, MBSI and ARE. The CAT-box is involved in meristem expression, zein metabolism is regulated by the O2-site, the GCN4_motif is involved in endosperm expression, the circadian motif plays an important role in controlling circadian rhythms, MBSI has a role in the regulation of flavonoid biosynthetic genes, and anaerobic induction is mediated by the ARE motif.
Furthermore, the analysis revealed that most of the NIG genes belonging to different groups contain a number of binding sites for a major class of plant transcription factors. For instance, the CCAAT-box is a MYBHv1 binding site. Detailed information about the cis-elements, their sequences and their functions is given in S1 Table. The number of occurrences of each cis-element was also determined. Together, the NIG genes contained 512 stress-responsive elements, of which light-responsive elements made up 29%, indicating that light-responsive elements are the most abundant in NIG genes. Similarly, 237 motifs are involved in hormonal regulation and at least 26 elements control cellular development, as shown in Fig 8.
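Counting cis-acting elements comes down to locating short motif strings in each promoter and then summarizing the totals by functional category. The sketch below is a simplified illustration rather than the PlantCARE procedure: it scans only the forward strand for exact matches, uses the CGTCA and TGACG motifs named above on a hypothetical promoter fragment, and converts the category totals reported in this section into percentages, treating "at least 26" as 26.

```python
def count_motif(promoter: str, motif: str) -> int:
    """Number of (possibly overlapping) exact matches on the forward strand."""
    promoter, motif = promoter.upper(), motif.upper()
    width = len(motif)
    return sum(
        1
        for i in range(len(promoter) - width + 1)
        if promoter[i:i + width] == motif
    )


def category_shares(counts: dict) -> dict:
    """Percentage of all counted elements that falls in each category."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical promoter fragment containing methyl-jasmonate-related motifs.
toy_promoter = "TTCGTCAAACGTCATGACG"
jasmonate_hits = (count_motif(toy_promoter, "CGTCA")
                  + count_motif(toy_promoter, "TGACG"))

# Category totals as reported in the text ("at least 26" taken as 26).
reported_counts = {"stress": 512, "hormonal": 237, "cellular development": 26}
shares = category_shares(reported_counts)
print(jasmonate_hits, {name: round(pct, 1) for name, pct in shares.items()})
```

On these totals, stress-responsive elements account for roughly two thirds of the categorized elements, with hormone-related elements making up most of the remainder.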

Discussion
Many legume plants respond to rhizobium by forming a symbiotic relationship and developing a structure on their roots known as a nodule. A nodule is an organ in which rhizobium converts atmospheric N2 into ammonia by a unique process [35]. This symbiotic relationship is triggered by the release of flavonoids from plants, which might act as chemo-attractants for Rhizobium spp. and induce the Nod genes, and by a signal generated in plants after the perception of Nod factors [36]. Nod factors activate several nodulation genes in nitrogen-fixing plants, and the characterization of Nod genes has been reported in many legumes [7,12,17]. For instance, nodulation factor receptor genes such as NFR1 and NFR5 were discovered in L. japonicus [12,13], and LYK3 and NFP were discovered in M. truncatula [10]. Furthermore, in L. japonicus and M. truncatula, the receptor-like kinases SYMRK and DMI2 serve as co-receptors for NF signaling [17]. It was therefore crucial to predict and understand the role of nodulation genes in Arachis hypogaea. Arachis hypogaea is the third largest legume crop and plays an important role in the world agricultural economy [37]. In this study, we analyzed the evolutionary relationships of the nodulation inducing gene (NIG) family in Arachis hypogaea and characterized the physicochemical properties, gene structure, motifs, protein localization, chromosomal location, gene duplication and selection pressure of the NIG family genes. The nodulation genes in allotetraploid Arachis hypogaea [38] were analyzed to understand the function of NIG family genes in flavonoid exudation, nodule development and biological nitrogen fixation.
The phylogenetic tree divided the NIG family genes into three sub-families, NIG-a, NIG-b and NIG-c, of which NIG-b was the largest with 16 members and NIG-c was the smallest with 12 members. By comparison, the previously identified MtGRAS family was divided into eight subfamilies [39], and the A. evenia LysM-RLK nodulation gene family contained 18 members [40]. The role of these genes in nodulation was also supported by the conserved amino acid residue analysis of the Arachis hypogaea genes. The sequence logos showed conservation at many regions among all the genes, indicating that the NIG family genes remained conserved throughout evolution. Moreover, in the phylogenetic tree the NIG genes with a very close evolutionary relationship were clustered together, suggesting that they may have related functions in plant nodulation.
Further analysis revealed that the gene length of the NIG family genes ranges from 543 bp to 3996 bp. Genes belonging to the NIG family are predicted to share similar gene structures and protein motif distributions, as well as a few conserved motifs, which indicates that the NIG gene family is well conserved. We also predicted that the proteins encoded by these genes have largely similar motifs and are therefore associated with specific functions in symbiotic relationships. For instance, Nucleoporin in L. japonicus and M. truncatula [41,42], RWP-RK in Arachis duranensis and Arachis ipaensis [43], the EF-hand domain in soybean [44] and LRR in L. japonicus [45] were found to be involved in nodulation.
AhNNup3 is the largest predicted protein (149014.72 Da), while AhNSur4 encodes the protein with the smallest molecular weight (38749.59 Da). In addition, the proteins were found to be located at different cellular sites, i.e., the nucleus, chloroplast and cell membrane, and were predicted to occupy either a single location or multiple locations.
Genes derived from the same ancestral gene are called orthologs and have the same biological functions, while genes that arise from a single gene through a duplication event are called paralogs. Paralogs encode proteins with dissimilar functions [25,46,47]. Duplicated genes mainly contribute to the formation of paralogous genes within families. The uneven distribution of NIG family genes on the 20 Arachis hypogaea chromosomes indicated that the NIG genes experienced duplication during evolution. In the MtGRAS gene family of the model legume M. truncatula, 17 genes were duplicated [39], while our investigation predicted seven duplicated gene pairs in the NIG family. Furthermore, these genes originated from segmental duplication events, and none of them originated from tandem duplication. Hence, segmental duplication contributed to functional divergence and gene family expansion.
Cis-regulatory elements present in the promoter regions of genes are considered responsible for controlling the environmental and developmental regulation of gene expression. We determined the types and numbers of cis-acting elements in the promoters of the NIG family genes. Many genes contained MBS, ACE, Box 4, MRE, CCAAT-box, GATA-motif, AT1-motif, I-box, ACA-motif, ATC-motif, Box II, AAAC-motif, WUN-motif, GA-motif, GC-motif and other elements, which might be responsible for the different functions of the gene families examined in this study. For instance, stress-related elements were abundant and might play an important role in adaptation to external environmental stresses such as low temperature, drought, light and defense. Cis-elements controlling hormonal regulation were the second largest in number. In addition, a few elements might play a role in cellular development. Furthermore, different numbers of such cis-acting elements were also discovered in the wild species A. duranensis and A. ipaensis, indicating distinct functions of these genes that are vital for plant growth and development [43].

Conclusion
The functions of many nodulation-related genes of legumes have been determined, but the functions of nodulation genes in Arachis hypogaea have not yet been demonstrated. In this study, NIG family genes were identified and divided according to evolutionary lineage into three sub-families. The expansion of the NIG family might be the result of segmental duplication. The biophysical properties indicate that NIG family proteins are located in distinct cellular compartments, which suggests that the proteins or enzymes are associated with specific functions. In addition, these genes have conserved motifs and gene structures, which suggests that the genes have functional similarities. The 42 genes are unevenly distributed across the chromosomes. Cis-element analysis suggested that NIG family genes play a pivotal role in stress responses, hormonal regulation and cellular development. Hence, the results of the genomic and bioinformatic analysis of NIG family genes provide valuable information about the phylogenetic relationships, structure and function of these family members. These results could help in developmental research and in the genetic improvement of nodulation genes in Arachis hypogaea and other valuable nitrogen-fixing plants to improve biological nitrogen fixation. However, functional characterization of the genes is needed to confirm their role in nitrogen fixation and their use in future research programs.